This paper presents a reduced-swing signal data transmission method for bus architectures in VLSIs, which consists of small bus drivers built from inverters, dual-rail transmission lines, termination resistors, and sense amplifiers for regenerating the signal swing. The optimum signal swing and the driving capacity of the sense amplifier are given as functions of the transmission line capacitance, using area·delay² as the guideline criterion. Using the results of the analysis, we propose a self-controlled data transmission module for the optimum reduced-swing signal. Applying the method to a 32-bit bus architecture shows that the total area, cycle time, and total power consumption are 66,070 µm², 0.90 ns, and 32.2 mW, respectively, whereas they are 284,000 µm², 1.12 ns, and 173.4 mW in the conventional chained buffer module. The proposed method is also less noisy than the conventional chained buffer method.
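As a rough illustration of the area·delay² criterion, the figures reported above can be plugged in directly. This is only a sketch: the function name `area_delay2` is hypothetical, and using the cycle time as the delay term is an assumption, since the abstract does not state which delay enters the metric.

```python
def area_delay2(area_um2: float, delay_ns: float) -> float:
    """Area x delay^2 figure of merit in um^2 * ns^2; lower is better."""
    return area_um2 * delay_ns ** 2


# Figures quoted in the abstract for the 32-bit bus.
# Assumption: the reported cycle time stands in for the delay term.
proposed = area_delay2(66_070, 0.90)
conventional = area_delay2(284_000, 1.12)
ratio = conventional / proposed

print(f"proposed:     {proposed:,.0f}")
print(f"conventional: {conventional:,.0f}")
print(f"ratio:        {ratio:.2f}")
```

Under that assumption, the proposed module scores roughly 6.7 times lower than the chained buffer module on this metric.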
Toru NAKURA Makoto IKEDA Kunihiro ASADA
This paper demonstrates power supply noise reduction using on-board stubs. A quarter-wavelength stub attached to the power supply line of an LSI chip works as a band-elimination filter and suppresses the power supply noise at the designed frequency. Preliminary experiments show that 87% of the noise component at the designed frequency is suppressed when stub patterns are formed on a power supply area of a PCB for an LSI operating at 1.25 GHz. The results suggest that the stub could be integrated on-chip as LSI operating frequencies rise and the required stub length shrinks.
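The relation between the design frequency and the stub length can be sketched with the standard quarter-wavelength formula, length = c / (4 f sqrt(eps_eff)). The effective permittivity of 4.0 used below is an assumed, illustrative value (the abstract does not give the board material), and the function name is hypothetical.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def quarter_wave_stub_length(freq_hz: float, eps_eff: float) -> float:
    """Length in metres of a quarter-wavelength stub at freq_hz.

    eps_eff is the effective relative permittivity seen by the line.
    """
    if freq_hz <= 0 or eps_eff < 1:
        raise ValueError("freq_hz must be positive and eps_eff at least 1")
    wavelength = C0 / (freq_hz * math.sqrt(eps_eff))
    return wavelength / 4


# 1.25 GHz is the operating frequency quoted in the abstract.
# eps_eff = 4.0 is an assumption for illustration only.
length_m = quarter_wave_stub_length(1.25e9, eps_eff=4.0)
print(f"{length_m * 1e3:.1f} mm")
```

With these assumed values the stub is about 30 mm long. Since the length is inversely proportional to frequency, doubling the operating frequency halves the stub, which is consistent with the remark that higher frequencies make on-chip integration more plausible.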
Taisuke KAZAMA Makoto IKEDA Kunihiro ASADA
We propose a shot reduction technique for character projection (CP) electron beam direct writing (EBDW) using a combined cell stencil (CCS) for advanced process technology. CP EBDW is expected both to reduce mask costs and to realize quick turnaround time. One major issue of conventional CP EBDW, however, is lithography throughput. The throughput is determined by the number of shots, which is proportional to the number of cell instances in the LSI. Conventional shot reduction techniques focus on optimizing cell stencil extraction, without any modification of the designed LSI mask patterns. The proposed technique employs the combined cell stencil together with a modified design flow for further shot reduction. We demonstrate a 22.4% shot reduction within a 4.3% area increase for a microprocessor, and a 28.6% shot reduction for IWLS benchmarks, compared with the conventional technique.
Hiroaki YOSHIDA Makoto IKEDA Kunihiro ASADA
This paper presents a structural approach for synthesizing arbitrary multi-output multi-stage static CMOS circuits at the transistor level, targeting the reduction of transistor counts. To make the problem tractable, the solution space is restricted to the circuit structures that can be obtained by performing algebraic transformations on an arbitrary prime-and-irredundant two-level circuit. The proposed algorithm is guaranteed to find the optimal solution within this solution space. The circuit structures are implicitly enumerated via structural transformations on a single graph structure, and a dynamic-programming-based algorithm then efficiently finds the minimum solution among them. Experimental results on a benchmark suite targeting standard cell implementations demonstrate the feasibility and effectiveness of the proposed approach. We also demonstrate the efficiency of the proposed algorithm through a numerical analysis of randomly generated problems.
Toru NAKURA Makoto IKEDA Kunihiro ASADA
This paper demonstrates a feedforward active substrate noise cancelling technique using a power supply di/dt detector. Since the substrate is usually tied to the ground line through a low impedance, the substrate noise is closely related to the ground bounce, which is proportional to di/dt when inductance dominates the ground line impedance. Our active canceller detects the di/dt of the power supply and injects an anti-phase current into the substrate so that the di/dt-proportional substrate noise is cancelled out. Our first trial achieves a 34% substrate noise reduction on our test circuit, and theoretical analysis shows that an optimized canceller design would raise the substrate noise suppression ratio to as much as 56%.
Ulkuhan EKINCIEL Hiroaki YAMAOKA Hiroaki YOSHIDA Makoto IKEDA Kunihiro ASADA
This paper describes the design and development of a module generator for a dual-rail PLA with embedded 2-input logic cells in 0.35 µm CMOS technology. To generate logic-cell-based PLA layouts automatically from circuit specifications, the module generator is developed as a design automation tool for the logic-cell-based PLA, with a structural improvement. The generator is based on a timing-driven design methodology and consists of logic synthesis, transistor sizing and logic cell generation, stimulus generation, and HDL model generation parts. It uses a design constraint to achieve flexible transistor sizing in the logic cell generation part, and the generated logic cells can be easily adapted to a layout generator. The layout is generated using a 0.35 µm, 3-metal-layer CMOS technology. Moreover, an HDL model generator is developed to create delay behavior models easily and quickly with precise timing parameters. The module generator partially reduces design complexity, which is becoming an important issue for VLSI circuits, and minimizes human-caused errors. A PLA layout in GDS-II format and an HDL behavior model of a Boolean function with a 64-bit input, a 1-bit output, and 220 product terms can be generated within 8 minutes on a Sun UltraSPARC-III 900 MHz processor. Because the module compiles in a very short time, designers can feasibly try many different design configurations to find a better one.
Jinmyoung KIM Toru NAKURA Koichiro ISHIBASHI Makoto IKEDA Kunihiro ASADA
This paper presents a decoupling capacitance boosting method for reducing the resonant supply noise caused by fast voltage hopping in DVS systems. The proposed method uses a foot transistor as a switch between a conventional decoupling capacitor (decap) and GND. Switching the foot transistor according to the supply noise state achieves effective noise reduction as well as a fast settling time compared with conventional passive decaps. Measurement results of a test chip fabricated in a 0.18 µm CMOS technology show a 12× boost of the effective decap value and a 65.8% supply noise reduction with a 96% settling time improvement.
Tetsuya IIZUKA Makoto IKEDA Kunihiro ASADA
This paper proposes a cell layout synthesis method via Boolean satisfiability (SAT). Cell layout synthesis problems are first transformed into SAT problems by our formulations. Our method realizes high-speed layout synthesis for CMOS logic cells and is guaranteed to generate minimum-width cells with routability under our layout styles. It considers complementary P-/N-MOSFETs individually during transistor placement, and can generate narrower layouts than the case in which the complementary P-/N-MOSFETs are paired. To demonstrate the effectiveness of our SAT-based cell synthesis, we present experimental results that compare it with a 0-1 ILP-based transistor placement method and a commercial cell generation tool. The experimental results show that our SAT-based method generates minimum-width placements in a much shorter run time than the 0-1 ILP-based transistor placement method, and generates the cell layouts of 32 static dual CMOS logic circuits in 54% of the run time of the commercial tool. The area increase of our method without compaction is only 3% compared with the commercial tool with compaction.
Mohamed ABBAS Makoto IKEDA Kunihiro ASADA
In this paper we present an on-chip noise detection circuit. In contrast with previous work on on-chip noise measurement, this detector does not assume specific noise properties such as periodicity. The detector can continuously capture a 10 ns time window of the measured signal with a resolution of 100 ps. The required bandwidth of the output channel can be adjusted freely, so the user can avoid the effect of on-chip parasitics and the need for sophisticated off-chip monitoring tools. The detector is equipped with an on-chip programmable voltage divider, which enables accurate measurement of both high- and low-swing fluctuations. The detector is therefore suitable for measuring non-periodic or single-event noise for the purposes of reliability evaluation and performance modeling. The detector is implemented in a test chip using Hitachi 0.18 µm technology.
Rimon IKENO Takashi MARUYAMA Satoshi KOMATSU Tetsuya IIZUKA Makoto IKEDA Kunihiro ASADA
Character projection (CP) is a high-speed mask-less exposure technique for electron-beam direct writing (EBDW). In CP exposure of VIA layers, higher throughput is realized if more VIAs are exposed in each EB shot, but this results in a huge number of VIA characters to cover arbitrary VIA arrangements. We adopt one-dimensional VIA arrays as the basic CP character architecture to increase the number of VIAs per EB shot while saving stencil area through superposed character arrangement. In addition, CP throughput is further improved by layout constraints on VIA placement in the detailed routing phase. Our experimental results demonstrate the feasibility of our exposure strategy for practical CP use in 14 nm lithography.
Hiroaki YAMAOKA Hiroaki YOSHIDA Makoto IKEDA Kunihiro ASADA
This paper describes an area-efficient dual-rail array logic architecture, a logic-cell-embedded PLA (LCPLA), which has 2-input logic cells in its structure. The 2-input logic cells, composed of pass-transistors, can realize any 2-input Boolean function and are embedded in a dual-rail PLA. The logic cells can be designed by connecting some local wires and do not require additional transistors over the logic cells of the conventional dual-rail PLA. By using the logic cells, some classes of logic functions can be implemented efficiently, so that high-speed and low-power operation is also achieved. The advantages over conventional PLAs and standard-cell-based designs are demonstrated using benchmark circuits, and the LCPLA is shown to be effective in reducing the number of product terms. In a structure with a 64-bit input and a 1-bit output including 220 product terms, the LCPLA achieves an area reduction of 35% compared to the conventional high-speed dual-rail PLA, and the power-delay product is reduced by 74% and 46% compared to the conventional high-speed single-rail PLA and the conventional high-speed dual-rail PLA, respectively. A test chip of this configuration was fabricated using a 0.35 µm, 3-metal-layer CMOS technology, and was verified with a functional test using a logic tester and an electron-beam tester at frequencies of up to 100 MHz with a supply voltage of 3.3 V.
Jinmyoung KIM Toru NAKURA Hidehiro TAKATA Koichiro ISHIBASHI Makoto IKEDA Kunihiro ASADA
This paper presents an on-chip resonant supply noise canceller that uses the parasitic capacitance of sleep blocks. The test chip was fabricated in a 0.18 µm CMOS process, and measurement results show 43.3% and 12.5% supply noise reduction for abrupt supply voltage switching and abrupt wake-up of a sleep block, respectively. The proposed method requires a 1.5% area overhead for four 100 k-gate blocks, which makes it 7.1× more efficient in noise reduction than the conventional decap for the same power supply noise, while achieving a 47% improvement in settling time. These results make fast power-mode switching possible for dynamic voltage scaling and power gating.
Yusuke OIKE Makoto IKEDA Kunihiro ASADA
A high-speed 3-D camera has potential in a wide variety of application fields, such as quick inspection of industrial components, observation of the motion or destruction of a target object, and fast collision prevention. In this paper, a row-parallel position detector for a high-speed 3-D camera based on the light-section method is presented. In our row-parallel search method, the positions of activated pixels are quickly detected by an in-pixel row-parallel search circuit and a row-parallel address acquisition that takes O(log N) cycles for an N-pixel horizontal resolution. The architecture maintains high-speed position detection at high pixel resolution. We have designed and fabricated a prototype position sensor with a 128×16 pixel array in a 0.35 µm CMOS process. The measurement results show that it achieves quick activated-position acquisition in 450 ns for "beyond-real-time" 3-D imaging and visual feedback. High-speed position detection of the scanning sheet beam is demonstrated.
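The O(log N) cycle count can be illustrated in software with a binary search over one row. This is a sketch of the general idea only, not the circuit described above: it assumes a hypothetical "is any pixel active in this range" query per cycle, and the function name is illustrative.

```python
from typing import List, Optional, Tuple


def find_first_active(row: List[bool]) -> Tuple[Optional[int], int]:
    """Locate the leftmost active pixel in a row by halving the search range.

    Returns (index, cycles), where cycles counts the range queries used.
    Each loop iteration stands in for one search cycle (an assumption).
    """
    if not any(row):
        return None, 0
    lo, hi = 0, len(row)  # invariant: row[lo:hi] holds the leftmost active pixel
    cycles = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        cycles += 1
        if any(row[lo:mid]):  # hypothetical "any active in left half" query
            hi = mid
        else:
            lo = mid
    return lo, cycles


# A 128-pixel row with two activated pixels.
row = [False] * 128
row[37] = True
row[90] = True
index, cycles = find_first_active(row)
print(index, cycles)
```

For a 128-pixel row the search finishes in log2(128) = 7 cycles regardless of where the activated pixel lies, which is why the cycle count grows only logarithmically with horizontal resolution.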
This paper presents optimal implementation methods for 256-bit elliptic curve digital signature algorithm (ECDSA) signature generation processors with high-speed Montgomery multipliers. We have explored the radix of the Montgomery multiplier data path from 2-bit to 256-bit operation and propose the use of pipelined Montgomery multipliers to optimize signature generation speed, area, and energy. The key factor in the design optimization is how modular multiplication is performed. The high-radix Montgomery multiplier is known to be an efficient implementation for high-speed modular multiplication. We have implemented ECDSA signature generation processors with high-radix Montgomery multipliers using 65 nm SOTB CMOS technology. Post-layout results show the fastest ECDSA signature generation time of 63.5 µs with a radix-256-bit, two-module, four-stream pipeline architecture; the smallest area of 0.365 mm² with a radix-16-bit zero-pipeline architecture; and the smallest signature generation energy of 9.51 µJ with a radix-256-bit zero-pipeline architecture.
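To show what the data-path radix changes, the following is a minimal software sketch of textbook word-serial Montgomery multiplication, in which `word_bits` plays the role of the radix width. It is not the authors' hardware design; the function name is hypothetical, and the NIST P-256 field prime is used only as an assumed example of a 256-bit modulus, since the abstract does not name a curve. It requires Python 3.8 or later for the modular inverse via `pow`.

```python
def montgomery_multiply(a: int, b: int, n: int,
                        word_bits: int, num_words: int) -> int:
    """Return a * b * R^-1 mod n, with R = 2**(word_bits * num_words).

    Requires n odd, n < R, and 0 <= a, b < n.
    """
    base = 1 << word_bits
    mask = base - 1
    # -n^-1 mod 2**word_bits, precomputed once per modulus.
    n_prime = (-pow(n, -1, base)) % base
    t = 0
    for i in range(num_words):
        a_i = (a >> (i * word_bits)) & mask   # i-th word of a
        t += a_i * b
        m = ((t & mask) * n_prime) & mask     # makes t + m*n divisible by base
        t = (t + m * n) >> word_bits          # exact division by the radix
    return t - n if t >= n else t


# Assumed example modulus: the NIST P-256 field prime.
P256 = 2**256 - 2**224 + 2**192 + 2**96 - 1

x = pow(3, 150, P256)
y = pow(5, 100, P256)
# Radix-16-bit data path: 16 iterations of 16 bits each.
result = montgomery_multiply(x, y, P256, word_bits=16, num_words=16)
print(hex(result))
```

A wider word reduces the number of loop iterations (128 for 2-bit words, 1 for a single 256-bit word) at the cost of a larger multiplier per iteration, which is the speed, area, and energy trade-off the abstract explores.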
Shingo MANDAI Taihei MOMMA Makoto IKEDA Kunihiro ASADA
This paper presents an architecture and a circuit design for readout address compression in a high-speed 3-D range-finding image sensor using the light-section method. We use a variable-length code modified to suit the 3-D range finder. The best compression rate of the proposed technique is 33.3%. The worst and average compression rates are 56.4% and 42.4%, respectively, in simulations of its effectiveness using examples of measured sheet scans. We also show measurement results of the fabricated image sensor with the address compression.
Kunihiro ASADA Makoto IKEDA Satoshi KOMATSU
This paper summarizes power reduction methods applicable to VLSI bus systems in terms of signal swing reduction, effective capacitance reduction, and signal transition reduction, which have been studied in the authors' research group. For each method, the basic concept is briefly reviewed along with some examples of its application. A future perspective is also described in the conclusion.
Tetsuya IIZUKA Jaehyun JEONG Toru NAKURA Makoto IKEDA Kunihiro ASADA
This paper proposes an all-digital process variability monitor that uses a simple buffer ring with a pulse counter. The proposed circuit monitors process variability from the count of a single pulse that propagates around the buffer ring and from the fixed logic level that remains after the pulse vanishes. The proposed circuit has been fabricated in a 65 nm CMOS process, and the measurement results demonstrate that PMOS and NMOS variabilities can be monitored independently. The proposed monitoring technique is suitable not only for on-chip process variability monitoring but also for in-field monitoring of aging effects such as negative/positive bias temperature instability (NBTI/PBTI).
Benjamin STEFAN DEVLIN Toru NAKURA Makoto IKEDA Kunihiro ASADA
We detail a self-synchronous field programmable gate array (SSFPGA) with a dual-pipeline (DP) architecture that conceals the pre-charge time of dynamic logic, and its throughput optimization using pipeline alignment implemented on benchmark circuits. A self-synchronous LUT (SSLUT) consists of a three-input tree-type structure with 8 bits of SRAM for programming. A self-synchronous switch box (SSSB) consists of both pass transistors and buffers to route signals, with 12 bits of SRAM. One common block with one SSLUT and one SSSB occupies an area of 2.2 Mλ² with 35 bits of SRAM, and the prototype SSFPGA with 34×30 (1020) blocks is designed and fabricated in 65 nm CMOS. Measured results at 1.2 V show 430 MHz and 647 MHz operation for a 3-bit ripple carry adder without and with throughput optimization, respectively. We find that the proposed pipeline alignment techniques allow operation at the maximum throughput of 647 MHz in various benchmarks on the SSFPGA. We demonstrate up to 56.1 times throughput improvement with our pipeline alignment techniques. The pipeline alignment is carried out within the number of logic elements available in the array and pipeline buffers available in the switching matrix.
Jinmyoung KIM Toru NAKURA Hidehiro TAKATA Koichiro ISHIBASHI Makoto IKEDA Kunihiro ASADA
Switched parasitic capacitors of sleep blocks with a tri-mode power gating structure are implemented to reduce on-chip resonant supply noise in a 1.2 V, 65 nm standard CMOS process. The tri-mode power gating structure makes it possible to store charge in the parasitic capacitance of the power-gated blocks. The proposed method achieves 53.1% and 57.9% noise reduction for wake-up noise and 130 MHz periodic supply noise, respectively. It also realizes noise cancelling without a discharging time before the parasitic capacitors of the sleep blocks are used, and shows an 8.4× boost of the effective capacitance value with a 2.1% chip area overhead. The proposed method thus reduces resonant supply noise more effectively while saving chip area.
Toru NAKURA Shingo MANDAI Makoto IKEDA Kunihiro ASADA
This paper presents a time difference amplifier (TDA) that amplifies an input time difference into an output time difference. Cross-coupled chains of variable delay cells with the same number of stages are used for the TDA, and the gain is adjusted via closed-loop control. The TDA was fabricated in 65 nm CMOS, and the measurement results show a time difference gain of 4.78 at the nominal power supply, against a designed gain of 4.0. The gain is stable, shifting by less than 1.4% under a 10% power supply voltage fluctuation.